A Tiling Perspective for Register Optimization
Authors
Abstract
Register allocation is a much-studied problem. Loops are a particularly important context for optimizing register allocation, since programs often spend a significant fraction of their execution time in loop code. Many register allocation algorithms have been proposed, but the complexity of the problem has led to the decoupling of several important aspects, including loop unrolling, register promotion, and instruction reordering. In this paper, we develop a unified optimization framework for register allocation and promotion that simultaneously considers the impact of loop unrolling and instruction scheduling. This is done via a novel instruction tiling approach in which the instructions within a loop are represented along one dimension and the innermost loop iterations along the other. By exploiting the regularity along the loop dimension and imposing essential dependence-based constraints on the intra-tile execution order, the problem of optimizing register pressure is cast in a constraint programming formalism. Experimental results from thousands of innermost loops extracted from the SPEC benchmarks demonstrate improvements over the current state of the art.
Key-words: compilation, compiler optimisation, register allocation, register spilling, register promotion, scheduling, constraint programming, loop transformations, loop unrolling, tiling, register tiling, locality
Author affiliations (in footnote order): Inria, Inria, OSU, Inria
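To make the tile picture in the abstract concrete, here is a minimal sketch, not the paper's constraint-programming formulation. It lays out a tile as (instruction, iteration) pairs, checks that an execution order respects the dependences, and counts how many values are live at once. The three-instruction loop body, the `DEPS` encoding, and the helpers `max_live`, `iteration_major`, and `instruction_major` are all assumptions made for illustration. Values that enter or leave the tile are ignored.

```python
# Hypothetical loop body (an assumption for illustration, not taken from the paper):
#   I0: a = load A[i]
#   I1: t = a + (a produced by the previous iteration)
#   I2: store t
# Each dependence is (producer, consumer, distance): the consumer at iteration k
# reads the value the producer defined at iteration k - distance.
DEPS = [(0, 1, 0), (0, 1, 1), (1, 2, 0)]
N_INSTR = 3


def max_live(order, deps=DEPS):
    """Return the peak number of simultaneously live values for an intra-tile order.

    `order` is a list of (instruction, iteration) pairs. A value is counted as
    live from its definition up to (but not including) its last use inside the
    tile. Values flowing in from outside the tile are ignored in this sketch.
    """
    pos = {node: t for t, node in enumerate(order)}
    last_use = dict(pos)  # a value with no consumer in the tile is never live
    for prod, cons, dist in deps:
        for instr, k in order:
            if instr != cons:
                continue
            src = (prod, k - dist)
            if src not in pos:
                continue  # producer lies outside the tile
            if pos[src] >= pos[(instr, k)]:
                raise ValueError(f"order violates dependence {src} -> {(instr, k)}")
            last_use[src] = max(last_use[src], pos[(instr, k)])
    return max(
        sum(1 for node in order if pos[node] <= t < last_use[node])
        for t in range(len(order))
    )


def iteration_major(unroll):
    """Run each iteration to completion before starting the next one."""
    return [(i, k) for k in range(unroll) for i in range(N_INSTR)]


def instruction_major(unroll):
    """Run one instruction across all iterations of the tile, then the next."""
    return [(i, k) for i in range(N_INSTR) for k in range(unroll)]


if __name__ == "__main__":
    for name, make in [("iteration-major", iteration_major),
                       ("instruction-major", instruction_major)]:
        print(name, max_live(make(3)))
```

With a tile of three iterations, the iteration-major order peaks at 2 live values and the instruction-major order at 4 in this toy model. This suggests why the choice of intra-tile order matters for register pressure, and a solver-based approach would search over such orders instead of comparing only two.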
Similar resources
Performance Evaluation of Tiling for the Register Level
Tiling is a well-known loop transformation that is mainly used to expose coarse-grain parallelism and to exploit data reuse at the cache level. However, it can also be used to exploit data reuse at the register level and to improve a program's ILP. Previous work on tiling and also commercial compilers are able to perform tiling for the register level in more than one dimension when the iter...
A Quantitative Algorithm for Data Locality Optimization
In this paper, we consider the problem of optimizing register allocation and cache behavior for loop array references. We exploit techniques developed initially for data locality estimation and improvement in the framework of cache or local memories. First we review the concept of "reference window" that serves as our basic tool for both data locality evaluation and management. Then we study ho...
A Compiler Perspective on Architectural Evolutions
Certain architectural features either constrain or inhibit compiler optimizations. We suggest three hardware changes aimed to improve the situation, from a compiler’s perspective. These changes involve redesigns of translation lookaside buffers, communication in memory hierarchies, and page mapping hardware for caches. Keywords— cache, compiler, optimization, PlayDoh, prefetch, tiling, TLB
Hierarchical tiling for improved superscalar performance
It takes more than a good algorithm to achieve high performance: inner-loop performance and data locality are also important. Tiling is a well-known method for parallelization and for improving data locality. However, tiling has the potential of being even more beneficial. At the finest granularity, it can be used to guide register allocation and instruction scheduling; at the coarsest level, it c...
PrimeTile: A Parametric Multi-Level Tiler for Imperfect Loop Nests
Tiling is a crucial loop transformation for generating high performance code on modern architectures. Efficient generation of multi-level tiled code is essential for maximizing data reuse in systems with deep memory hierarchies. Tiled loops with parametric tile sizes (not compile-time constants) facilitate runtime feedback and dynamic optimizations used in iterative compilation and automatic tu...
Journal title:
- CoRR
Volume: abs/1406.0582
Issue:
Pages: -
Publication date: 2014